Human-Content and Gesture-Event Video Coding
Abstract
Bandwidth limitations remain a major challenge for delivering high-quality multimedia content to users. In this research, we aim to improve the compression of human-centered video sequences such as lectures, monologues, and presentations. Based on the idea that viewers pay more attention to the face and hand regions in videos of people speaking, our approach encodes those regions at a higher resolution than the rest of the image. Using computer vision techniques, we segment and track the subject’s face and hands. The face region is assigned the highest salience value. Gesture analysis of the hands is then used to encode important gesture events at high salience and non-gestures at a lower value. We demonstrate the differential video coder by producing three highly salient, low-bandwidth video monologue sequences.
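The salience ordering described in the abstract (face highest, gesture events high, non-gesture hands lower, background lowest) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' coder: the numeric salience values, the rectangular region format, and the names `SALIENCE`, `salience_map`, and `downsample_factor` are all assumptions, since the abstract gives only the ordering.

```python
# Hypothetical salience levels: the abstract specifies only their ordering,
# not these numeric values.
SALIENCE = {
    "background": 0.2,
    "hand_idle": 0.4,      # hands tracked, but no gesture event detected
    "hand_gesture": 0.8,   # hands during an important gesture event
    "face": 1.0,           # face always receives the highest salience
}


def salience_map(width, height, regions):
    """Build a height x width grid of salience values.

    regions is a list of (label, x0, y0, x1, y1) boxes with exclusive
    x1 and y1. Where boxes overlap, the higher salience value wins.
    """
    grid = [[SALIENCE["background"]] * width for _ in range(height)]
    for label, x0, y0, x1, y1 in regions:
        value = SALIENCE[label]
        for y in range(max(0, y0), min(height, y1)):
            row = grid[y]
            for x in range(max(0, x0), min(width, x1)):
                row[x] = max(row[x], value)
    return grid


def downsample_factor(salience, max_factor=8):
    """Map a salience value in (0, 1] to an integer subsampling factor.

    A factor of 1 means full resolution; larger factors mean the region
    would be encoded more coarsely. The linear mapping is an assumption.
    """
    return max(1, round(max_factor * (1.0 - salience)))


if __name__ == "__main__":
    regions = [("face", 2, 1, 5, 3), ("hand_gesture", 4, 2, 7, 5)]
    for row in salience_map(8, 6, regions):
        print(" ".join(f"{v:.1f}" for v in row))
```

In a sketch like this, the tracker would supply the region boxes per frame and the gesture analysis would switch a hand box between the `hand_idle` and `hand_gesture` labels, so that only gesture events are encoded at high resolution.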
Similar resources
Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study
Despite considerable advances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and the arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...
Recognition of Visual Events using Spatio-Temporal Information of the Video Signal
Recognition of visual events as a video analysis task has become popular in the machine learning community. While traditional approaches to detecting video events have been used for a long time, recently developed deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...
Gesture and Speech for Video Content Navigation
This article describes ongoing research in the use of computer vision gesture and speech recognition techniques as a natural interface for video content navigation, and the design of a navigation and browsing system that caters to these natural means of computer-human interaction. For consumer applications, video content navigation presents two challenges: (1) how to parse and summarize multiple v...
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
Web-Powered Virtual Site Exploration Based on Augmented 360 Degree Video via Gesture-Based Interaction
Physically attending an event or visiting a venue might not always be practically feasible (e.g., due to travel overhead). This article presents a system that enables users to remotely navigate in and interact with a real-world site using 360° video as primary content format. To showcase the system, a demonstrator has been built that affords virtual exploration of a Belgian museum. The system b...
Publication date: 2002